    Dissecting Arbitrary-scale Super-resolution Capability from Pre-trained Diffusion Generative Models

    Diffusion-based Generative Models (DGMs) have achieved unparalleled performance in synthesizing high-quality visual content, opening up opportunities to improve image super-resolution (SR). Recent solutions either train architecture-specific DGMs from scratch or require iterative fine-tuning and distillation of pre-trained DGMs, both of which demand considerable time and hardware investment. More importantly, since DGMs are built around a discrete, pre-defined upsampling scale, they cannot adequately meet the emerging requirements of arbitrary-scale super-resolution (ASSR), where a single unified model adapts to arbitrary upsampling scales instead of a separate model being prepared for each case. These limitations raise an intriguing question: can we identify the ASSR capability of existing pre-trained DGMs without the need for distillation or fine-tuning? In this paper, we take a step towards answering this question by proposing Diff-SR, the first ASSR approach based solely on pre-trained DGMs, requiring no additional training. It is motivated by the finding that a simple methodology, which first injects a specific amount of noise into the low-resolution image before invoking a DGM's backward diffusion process, outperforms current leading solutions. The key insight lies in determining a suitable amount of noise to inject: too little noise leads to poor low-level fidelity, while too much degrades the high-level signature. Through a fine-grained theoretical analysis, we propose the Perceptual Recoverable Field (PRF), a metric that achieves the optimal trade-off between these two factors. Extensive experiments verify the effectiveness, flexibility, and adaptability of Diff-SR, demonstrating superior performance to state-of-the-art solutions in diverse ASSR settings.
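    The noise-injection step can be illustrated with the standard DDPM forward marginal, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. This is a minimal sketch under stated assumptions: a linear beta schedule (1000 steps, 1e-4 to 0.02), nearest-neighbour upsampling to reach a non-integer target scale, and a fixed step t. How Diff-SR picks t via the PRF is not reproduced, no reverse process is run, and all function names are hypothetical.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def upsample_nearest(lr, scale):
    """Nearest-neighbour upsampling of an (H, W, C) array by a possibly non-integer scale."""
    h, w = lr.shape[:2]
    out_h, out_w = int(round(h * scale)), int(round(w * scale))
    rows = np.minimum((np.arange(out_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(out_w) / scale).astype(int), w - 1)
    return lr[rows][:, cols]


def inject_noise(x0, t, alpha_bar, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I), the DDPM forward marginal."""
    a_t = alpha_bar[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps


rng = np.random.default_rng(0)
lr = rng.uniform(-1.0, 1.0, size=(16, 16, 3))  # toy LR image scaled to [-1, 1]
alpha_bar = linear_alpha_bar()
x_up = upsample_nearest(lr, 2.5)               # non-integer scale: 16 -> 40 pixels
x_t = inject_noise(x_up, t=300, alpha_bar=alpha_bar, rng=rng)
# x_t would then be handed to the pre-trained model's reverse process, starting at step t.
```

    A larger t gives a smaller alpha_bar[t], so more of the upsampled image is replaced by noise. Choosing t is therefore the fidelity-versus-signature trade-off the abstract describes.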

    Spatio-Temporal Calibration for Omni-Directional Vehicle-Mounted Event Cameras

    We present a solution to the problem of spatio-temporal calibration for event cameras mounted on an omni-directional vehicle. Unlike traditional methods, which typically determine the camera's pose with respect to the vehicle's body frame by aligning trajectories, our approach leverages the kinematic correlation between two sets of linear velocity estimates, obtained from event data and from wheel odometers, respectively. The calibration task consists of estimating the temporal offset between the two heterogeneous sensors and then recovering the extrinsic rotation that defines the linear relationship between the two sets of velocity estimates. The first sub-problem is formulated as an optimization problem that seeks the temporal offset maximizing a correlation measure invariant to arbitrary linear transformations. Once the temporal offset is compensated, the extrinsic rotation is computed with an iterative closed-form solver that incrementally registers the associated linear velocity estimates. The proposed algorithm is shown to be effective on both synthetic and real data, outperforming traditional trajectory-alignment methods.
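    The two calibration stages can be sketched as follows, under assumptions that go beyond the abstract: both velocity streams are resampled to a common rate, the offset is searched only over whole samples, the score (the uncentred R^2 of a least-squares fit, which does not change when the wheel velocities are multiplied by any invertible matrix) is a stand-in for the paper's correlation measure, and a one-shot Kabsch/SVD solve replaces the paper's iterative solver. Function names are hypothetical.

```python
import numpy as np


def align(v_wheel, v_event, s):
    """Pair v_event[j] with v_wheel[j + s], keeping only indices valid in both arrays."""
    n = len(v_wheel)
    if s >= 0:
        return v_wheel[s:], v_event[:n - s]
    return v_wheel[:n + s], v_event[-s:]


def fit_score(a, b):
    """Uncentred R^2 of the least-squares fit b ~ a @ M.

    Replacing a by a @ T for any invertible T leaves the score unchanged,
    because the column space of a (and hence the fitted values) does not change.
    """
    m, *_ = np.linalg.lstsq(a, b, rcond=None)
    resid = b - a @ m
    return 1.0 - np.sum(resid ** 2) / np.sum(b ** 2)


def estimate_offset(v_wheel, v_event, max_shift):
    """Integer sample shift in [-max_shift, max_shift] with the highest fit score."""
    shifts = range(-max_shift, max_shift + 1)
    return max(shifts, key=lambda s: fit_score(*align(v_wheel, v_event, s)))


def kabsch_rotation(a, b):
    """Rotation R minimising sum_i ||R a_i - b_i||^2 over rows a_i, b_i (closed form via SVD)."""
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against returning a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


# Synthetic check: event velocities are the wheel velocities rotated and delayed by 3 samples.
rng = np.random.default_rng(1)
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
v_wheel = rng.standard_normal((200, 3))
v_event = np.roll(v_wheel, -3, axis=0) @ R_true.T
s_hat = estimate_offset(v_wheel, v_event, max_shift=10)
R_hat = kabsch_rotation(*align(v_wheel, v_event, s_hat))
```

    Separating the stages mirrors the abstract: because the offset score ignores any linear map between the streams, the offset can be found before the rotation is known.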

    Research for Inertia Response and Primary Frequency Regulation Ability of Wind Turbine

    [Introduction] Large-scale grid connection of wind power poses great challenges to the stability of grid operation, especially frequency stability. To address the inadequate frequency regulation capability caused by large-scale wind power integration and to improve the frequency adaptability of grid-connected wind power, wind turbines need to provide a frequency regulation function with a timely response. [Method] This paper adopted a frequency regulation scheme based on rotor kinetic energy and pitch angle reserve, which can provide fast and accurate active power support to the grid when the grid frequency changes. First, the main control algorithm was designed based on a theoretical analysis of inertia response and of the primary frequency regulation algorithm logic. Then, its functions were verified on a co-simulation platform. Finally, a field test was carried out in an actual project. [Result] The simulation and test results showed that the scheme based on rotor kinetic energy and pitch angle reserve could cope with a variety of grid frequency changes and quickly provide active power support. [Conclusion] The frequency regulation scheme enables wind turbines to perform a fast inertia response (response time under 500 ms) and a primary frequency regulation response (response time under 5 s) under various frequency change conditions, providing active support that helps the grid frequency recover and effectively improves the frequency adaptability of wind turbines.
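    The two support functions can be illustrated with a generic textbook control law: a synthetic-inertia term proportional to the rate of change of frequency, plus a droop term proportional to the frequency deviation outside a deadband, clipped to the available reserve. This is a sketch, not the paper's controller. The virtual inertia constant, droop, deadband, and a single symmetric reserve limit (standing in for both rotor kinetic energy and pitch reserve) are hypothetical values, and the function name is illustrative.

```python
import math


def frequency_support_kw(delta_f_hz, rocof_hz_per_s, p_rated_kw,
                         f_nom_hz=50.0, h_virtual_s=4.0, droop=0.05,
                         deadband_hz=0.03, reserve_kw=None):
    """Extra active power (kW, positive = raise output) from a generic inertia + droop law.

    All gains and limits are illustrative assumptions, not values from the paper.
    """
    # Synthetic inertia: oppose the rate of change of frequency, like a machine with
    # inertia constant h_virtual_s (dP_pu = -2H * d(f/f_nom)/dt).
    p_inertia = -2.0 * h_virtual_s * (rocof_hz_per_s / f_nom_hz) * p_rated_kw

    # Primary frequency regulation: proportional (droop) response outside a deadband.
    if abs(delta_f_hz) <= deadband_hz:
        p_droop = 0.0
    else:
        excess_hz = delta_f_hz - math.copysign(deadband_hz, delta_f_hz)
        p_droop = -(excess_hz / f_nom_hz) / droop * p_rated_kw

    total = p_inertia + p_droop
    # Available reserve bounds how much support the turbine can actually deliver.
    if reserve_kw is not None:
        total = max(-reserve_kw, min(reserve_kw, total))
    return total


# Frequency 0.2 Hz below nominal and still falling at 0.5 Hz/s on a 2 MW turbine.
support = frequency_support_kw(-0.2, -0.5, p_rated_kw=2000.0)
```

    With these assumed settings the inertia term contributes 160 kW and the droop term 136 kW. The 500 ms and 5 s response times reported above are properties of the real system and are not modelled here.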

    Uremia toxin helps to induce inflammation in intestines by activating the ATM/NEMO/NF-κB signalling pathway in human intestinal epithelial cells

    During progressive chronic kidney disease, toxic substances known as uremic toxins accumulate in body fluids. Uremic toxins have been documented to be involved in most inflammatory reactions, and indoxyl sulfate (IS), a major serum metabolite in uremia, is a key player in this. The mechanism by which uremic toxins exert their inflammatory activity is poorly understood; however, researchers believe that a clear understanding of this process could guide efforts to combat it. This study was designed to investigate the role of uremic toxins in intestinal inflammation, using the SW480 cell line. A luciferase assay was used to measure cell viability at different concentrations of IS. RT-qPCR was used to measure the effect of IS on the expression of inflammatory factors. The comet assay was used to detect DNA damage, and Western blotting was used to measure the phosphorylation levels of ATM, NEMO, and NF-κB. An IS concentration of 0.09 nM was determined by luciferase assay to be the optimal experimental concentration. The results showed that IS promotes the expression of the inflammatory factors TNF-α and IL-6. In addition, IS increased DNA damage in the cells. IS promoted ATM phosphorylation, which led to phosphorylation of NEMO and activation of the NF-κB signalling pathway. In conclusion, uremic toxins facilitate intestinal inflammation by activating the ATM/NEMO/NF-κB signalling pathway in human intestinal epithelial cells.

    LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

    We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method that achieves state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach combines Whisper, a weakly supervised robust speech recognition model, with GPT-4, the most performant chat-based large language model at the time of writing. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain", acting as an annotator with strong performance in contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset, released under a CC-BY-NC-SA license and based on MTG-Jamendo, and offer a human-annotated subset for noise level estimation and evaluation. We anticipate that our method and dataset will advance the development of multilingual lyrics transcription, a challenging and emerging task. Comment: 9 pages, 2 figures, 5 tables, accepted by ISMIR 202
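    Word Error Rate, the metric cited above, is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch follows; LyricWhiz's text normalisation (case, punctuation) is not described in the abstract, so this version compares whitespace-separated tokens as-is, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the quick brown fox", "the quick brown box"))  # one substitution
```

    Keeping only two rows of the dynamic-programming table makes memory linear in the hypothesis length, which matters little for lyrics but is the usual idiom.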

    On the Effectiveness of Speech Self-supervised Learning for Music

    Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) remains largely unexplored. While previous SSL models pre-trained on music recordings have mostly been closed-source, recent speech models such as wav2vec 2.0 have shown promise in music modelling. Nevertheless, research on applying speech SSL models to music recordings has been limited. We explore the music adaptation of SSL with two distinct speech-related models, data2vec 1.0 and HuBERT, and refer to the adapted models as music2vec and musicHuBERT, respectively. We train 12 SSL models with 95M parameters under various pre-training configurations and systematically evaluate their performance on 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we also identify limitations of these existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, we offer empirical suggestions for designing future musical SSL strategies and paradigms.

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has been proven effective in speech and audio, its application to music audio has yet to be thoroughly explored. This is primarily due to the distinctive challenges of modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in masked language modelling (MLM) style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches. This combination includes an acoustic teacher based on Residual Vector Quantization - Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance the robustness of the learned representations. Furthermore, we explore a wide range of settings to overcome the instability of acoustic language model pre-training, which allows our paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises and performs well on 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT
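    The abstract names an in-batch noise mixture augmentation without giving its details. One plausible reading, sketched here purely as an assumption: with probability p, each clip in a batch has a different clip from the same batch added to it at a random signal-to-noise ratio. The probability, SNR range, sample rate, and function name are hypothetical, not taken from MERT.

```python
import numpy as np


def in_batch_noise_mix(batch, rng, p=0.5, snr_db_range=(5.0, 20.0)):
    """Return a copy of `batch` (shape (B, T)) where each clip, with probability p,
    has a different clip from the same batch mixed in at a random SNR (in dB)."""
    out = batch.copy()
    size = batch.shape[0]
    if size < 2:
        return out  # no other clip to borrow as "noise"
    for i in range(size):
        if rng.random() >= p:
            continue
        j = rng.choice([k for k in range(size) if k != i])
        snr_db = rng.uniform(*snr_db_range)
        signal_power = np.mean(batch[i] ** 2)
        noise_power = np.mean(batch[j] ** 2) + 1e-12  # avoid division by zero on silence
        # Scale the borrowed clip so that signal_power / scaled_noise_power == 10**(snr_db / 10).
        gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
        out[i] = batch[i] + gain * batch[j]
    return out


rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 16000))  # four 1-second clips at an assumed 16 kHz
augmented = in_batch_noise_mix(batch, rng, p=0.5)
```

    Borrowing the "noise" from the same batch needs no external noise corpus, and the mixed-in material is music-like, which is presumably what makes such an augmentation useful for robustness.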

    Robust Visual Compass Using Hybrid Features for Indoor Environments

    Orientation estimation is a crucial part of robotics tasks such as motion control, autonomous navigation, and 3D mapping. In this paper, we propose a robust vision-based method to estimate a robot's drift-free orientation with RGB-D cameras. First, we detect and track hybrid features (i.e., planes, lines, and points) in color and depth images, which provide reliable constraints even in challenging environments with low texture or without consistent lines. Then, we construct a cost function based on these features and, by minimizing it, obtain an accurate rotation matrix for each captured frame with respect to its reference keyframe. Furthermore, we present a vanishing-direction estimation method to extract the Manhattan World (MW) axes; by aligning the current MW axes with the global MW axes, we refine the rotation matrix of each keyframe and achieve drift-free orientation. Experiments on public RGB-D datasets demonstrate the robustness and accuracy of the proposed algorithm for orientation estimation. In addition, we have applied the proposed visual compass to pose estimation, and the evaluation on public sequences shows improved accuracy.
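    The MW alignment step can be sketched as follows, assuming the three MW axes have already been extracted from vanishing directions and matched, with consistent signs, to the global axes (the permutation and sign ambiguities a real system must resolve are ignored). Orthonormalising the noisy directions by projecting onto the nearest rotation with an SVD is our choice, not necessarily the paper's, and the function names are hypothetical.

```python
import numpy as np


def nearest_rotation(m):
    """Project a 3x3 matrix onto SO(3) (closest rotation in the Frobenius sense)."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))  # flip the last axis if u @ vt is a reflection
    return u @ np.diag([1.0, 1.0, d]) @ vt


def orientation_from_manhattan(axes_cam, axes_world=None):
    """Rotation R_wc (camera -> world) with axes_world ~= R_wc @ axes_cam.

    Columns of both 3x3 inputs are the three Manhattan World directions, assumed
    already matched to each other and sign-consistent.
    """
    if axes_world is None:
        axes_world = np.eye(3)  # global MW frame taken as the world frame
    return nearest_rotation(axes_world) @ nearest_rotation(axes_cam).T


# Synthetic check: noisy vanishing directions observed from a rotated camera.
rng = np.random.default_rng(2)
R_wc_true = nearest_rotation(rng.standard_normal((3, 3)))
axes_cam = R_wc_true.T @ np.eye(3) + 1e-3 * rng.standard_normal((3, 3))
R_wc_hat = orientation_from_manhattan(axes_cam)
```

    Because the global MW axes do not change over time, an orientation computed this way does not accumulate drift, which is why aligning against them can correct the frame-to-keyframe estimates.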

    A Hypergraph Matching Labeled Multi-Bernoulli Filter for Group Targets Tracking
